REFERRING EXPRESSION GENERATION TOWARDS MEDIATING SHARED PERCEPTUAL BASIS IN SITUATED DIALOGUE
Author: Rui Fang
Abstract
Situated human-robot dialogue has received increasing attention in recent years. In situated dialogue, robots or artificial agents and their human partners are co-present in a shared physical world, and robots need to automatically perceive and reason about the shared environment. Because of a robot's limited perceptual and reasoning capabilities, its representation of the shared world is often incomplete, error-prone, and significantly mismatched with its human partner's. Although the human and the robot are physically co-present, a joint perceptual basis between them cannot be established, so referential communication between the human and the robot becomes difficult. Robots need to collaborate with their human partners to establish a joint perceptual basis, and referring expression generation (REG) thus becomes an important problem in situated dialogue. REG is the task of generating referring expressions to describe target objects such that the intended objects can be correctly identified by the human. Although extensively studied, most existing REG algorithms were developed and evaluated under the assumption that agents and humans have access to the same kind of domain information. This is clearly not true in situated dialogue. The objective of this thesis is to investigate how to generate referring expressions that mediate the mismatched perceptual basis between humans and agents. As a first step, a hypergraph-based approach is developed to account for group-based spatial relations and for uncertainties in perceiving the environment. This approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%).
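The key idea behind a hypergraph representation is that a single hyperedge can connect an arbitrary set of objects, so a group-based spatial relation (e.g., "the three blocks forming a row") is captured as one relation rather than a cluster of pairwise edges, and each node and relation can carry a perceptual confidence score. The following is a minimal illustrative sketch of that idea, not the thesis implementation; all class names, attribute encodings, and scores are hypothetical.

```python
# Illustrative sketch (hypothetical names, not the thesis implementation):
# a scene hypergraph where hyperedges may span object groups, with
# confidence scores reflecting perceptual uncertainty.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    """A perceived object with (possibly uncertain) attribute scores."""
    obj_id: str
    attributes: tuple  # e.g. (("color", "red", 0.9),)

@dataclass
class Hyperedge:
    """A relation over an arbitrary set of nodes, with a confidence score."""
    relation: str       # e.g. "row", "left_of"
    members: frozenset  # ids of all participating objects
    confidence: float   # perceptual uncertainty of the relation

@dataclass
class SceneHypergraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_object(self, node: Node) -> None:
        self.nodes[node.obj_id] = node

    def add_relation(self, relation, member_ids, confidence) -> None:
        self.edges.append(Hyperedge(relation, frozenset(member_ids), confidence))

    def groups_containing(self, obj_id: str) -> list:
        """Group-level hyperedges (3+ members) that involve a given object."""
        return [e for e in self.edges
                if obj_id in e.members and len(e.members) > 2]

# An ordinary graph would need separate pairwise edges; here one hyperedge
# states that o1, o2, o3 form a row, with confidence 0.85.
scene = SceneHypergraph()
for oid in ("o1", "o2", "o3"):
    scene.add_object(Node(oid, (("color", "red", 0.9),)))
scene.add_relation("row", ("o1", "o2", "o3"), confidence=0.85)
print(len(scene.groups_containing("o2")))  # → 1
```

A generation algorithm over such a structure can then describe a target by the groups it belongs to ("the middle one in the row") instead of only its individual attributes, which is what makes group-based spatial relations expressible at all.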
This large performance gap indicates that when the agent applies traditional approaches (which usually generate a single minimal description) to describe target objects, the intended objects often cannot be correctly identified by the human. To address this problem, motivated by collaborative behaviors in human referential communication, two collaborative models are developed for REG: an episodic model and an installment model. Instead of generating a single referring expression to describe a target object as in previous work, both models generate multiple smaller expressions that lead to the target object, with the goal of minimizing the collaborative effort. In particular, the installment model incorporates human feedback in a reinforcement learning framework to learn optimal generation strategies. Our empirical results have shown that the episodic model and the installment model outperform our non-collaborative hypergraph-based approach with absolute gains of 6% and 21% respectively. Lastly, the collaborative models are further extended to embodied collaborative models to facilitate human-robot interaction. These embodied models seamlessly incorporate robot gesture behaviors (i.e., pointing to an object) and the human's gaze feedback (i.e., looking at a particular object) into the collaborative model for REG. The empirical results have shown that when robot gestures and human verbal feedback are incorporated, the new collaborative model achieves over 28% absolute gains compared to the baseline collaborative model. This thesis further discusses the opportunities and challenges brought by modeling embodiment in collaborative referential communication in human-robot interaction.
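To make the installment idea concrete: the agent emits one small expression ("installment") at a time, observes whether the listener's feedback indicates success, and uses that reward signal to learn which kinds of installments to prefer. The sketch below frames this as simple tabular Q-learning against a simulated listener; the action set, success probabilities, and reward values are all toy assumptions for illustration, not the strategies or numbers from the thesis.

```python
# Illustrative sketch (toy assumptions, not the thesis algorithm): learning
# which installment type to emit via tabular Q-learning, where simulated
# listener feedback provides the reward.

import random

ACTIONS = ["describe_color", "describe_spatial", "point_then_describe"]

# Hypothetical chance that each installment type lets the listener
# identify the intended object.
SUCCESS_PROB = {"describe_color": 0.2, "describe_spatial": 0.5,
                "point_then_describe": 0.9}

def simulated_feedback(action, rng):
    """Toy stand-in for listener feedback: did this installment succeed?"""
    return rng.random() < SUCCESS_PROB[action]

def q_learn(episodes=5000, alpha=0.05, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # epsilon-greedy choice of the next installment
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=q.get)
        # reward: positive on correct identification, small cost otherwise
        reward = 1.0 if simulated_feedback(action, rng) else -0.1
        # one-step episode, so no bootstrapped next-state term
        q[action] += alpha * (reward - q[action])
    return q

q = q_learn()
best = max(ACTIONS, key=q.get)
print(best, round(q[best], 2))
```

In this toy setup the learner settles on the installment type with the highest expected reward; in the thesis setting, the reward would instead come from real or simulated human feedback during dialogue, and the state would encode the dialogue context rather than being a single step.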
Similar Resources
Towards Situated Dialogue: Revisiting Referring Expression Generation
In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. Their representations of the shared world are misaligned. Thus referring expression generation (REG) will need to take this discrepancy into consideration. To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perc...
REFERENTIAL GROUNDING TOWARDS MEDIATING SHARED PERCEPTUAL BASIS IN SITUATED DIALOGUE
Towards Mediating Shared Perceptual Basis in Situated Dialogue
To enable effective referential grounding in situated human-robot dialogue, we have conducted an empirical study to investigate how conversation partners collaborate and mediate shared basis when they have mismatched visual perceptual capabilities. In particular, we have developed a graph-based representation to capture linguistic discourse and visual discourse, and applied inexact graph matchi...
Modeling Collaborative Referring for Situated Referential Grounding
In situated dialogue, because humans and agents have mismatched capabilities of perceiving the shared physical world, referential grounding becomes difficult. Humans and agents will need to make extra efforts by collaborating with each other to mediate a shared perceptual basis and to come to a mutual understanding of intended referents in the environment. In this paper, we have extended our pr...
A Wizard-of-Oz Environment to Study Referring Expression Generation in a Situated Spoken Dialogue Task
We present a Wizard-of-Oz environment for data collection on Referring Expression Generation (REG) in a real situated spoken dialogue task. The collected data will be used to build user simulation models for reinforcement learning of referring expression generation strategies.